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I. INTRODUCTION 

In an earlier paper [T], hereafter referred to as Paper I, 
we presented a theoretical framework for analyzing snap- 
shots formed by scattering. In this paper, we demon- 
strate the power of this approach to reconstruct three- 
dimensional (3D) models and time-series from random 
sightings at extremely low signal, with no orientational 
or timing information. The theoretical framework in Pa- 
per I represents the information content of an ensem- 
ble of snapshots as a Riemannian manifold, and shows 
that the properties of operations in space give rise to 
object-independent symmetries. Purposeful navigation 
on this manifold is tantamount to reconstructing a 3D 
model of the sighted system and/or its evolution, in the 
sense that given any snapshot, any other can be pro- 
duced on demand. The symmetries of the manifold re- 
veal its natural eigenfunctions, thus allowing physically- 
based interpretation of graph-theoretic analysis, and en- 
hanced noise discrimination. Simple algorithms then suf- 
fice to reach exceptionally low signal-to-noise levels un- 
matched by other approaches in terms of computational 
cost, noise robustness, or both. As examples, we demon- 
strate structure recovery from radiation-sensitive objects 
at doses at least an order of magnitude below current 
levels (signal-to- noise ratio (SNR): —16 dB), and recon- 
struction of time-series at SNR values as low as —21 dB. 
The versatility of the approach is demonstrated in the 
context of simulated and experimental data from X-ray 



diffraction, cryo-electron microscopy (cryo-EM) , and op- 
tical snapshots using a variety of graph-theoretic tech- 
niques. These applications demonstrate the generality of 
the symmetry-based approach, elucidating at the same 
time the measures needed to deal successfully with ex- 
perimental data, a key benchmark of the practical utility 
of any theoretical framework. 



This paper is organized as follows. Without claim to 
be comprehensive, Sec. |ll] briefly summarizes previous 
work in the field to provide a context for the applica- 
tions discussed in this paper. For the convenience of 



non-mathematical readers. Sec. Ill provides a concep- 
tual outline of the theoretical framework developed in 
Paper I. Sec. |IV A] describes 3D reconstruction from sim- 
ulated diffraction snapshots of single biomolecules at the 
signal level expected from single molecules in upcoming 
experiments utilizing the new generation of X-ray Free 
Electron Lasers (XFELs) [2H11- Sec. |IVB| establishes, in 
principle, the applicability of our approach to crystalline 
samples. Sec. IV C addresses structure recovery from 
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simulated and experimental cryo-EM snapshots of sin- 
gle molecules. In this case, essential experimental issues 
such as defocus variation must be faced and incorporated 
into the theoretical formalism. Sec. IIVDI demonstrates 
reconstruction of time-series (movies) from random se- 
quences of ultralow-signal optical snapshots. The paper 
concludes in Sec. [V] with a summary of our key findings 
and their implications. Detailed points of a technical na- 
ture are elucidated in appendices, and movies provided 
as supplementary online material EPAPS [5]. 
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II. PREVIOUS WORK 

As described in Paper I, we are concerned with con- 
structing a model from random sightings of a system 
viewed in some projection, i.e., by accessing a hmited 
number of variables describing the state of the system. 
A 3D model of an object and its evolution, for example, 
can be constructed from an ensemble of low-signal 2D 
snapshots without orientational information [31 21 IHHS] ■ 
Modern graph-theoretic algorithms can now be used to 
discover low-dimensional manifolds representing the in- 
formation content of datasets in some high-dimensional 
space determined by the measurement apparatus |10H17j . 
The power of these methods stems from their generality, 
in the sense that no assumptions are made as to the na- 
ture of the data or their internal correlations. This brings 
with it four major challenges: (1) Interpretation of the 
analysis results ("what physical variables do the man- 
ifold dimensions represent?"); (2) Computational cost 
and scaling behavior on moving from simulated ("toy") 
datasets to experimental measurements; (3) Robustness 
against noise, particularly of non-additive, non-Gaussian 
types; and (4) Incorporation of inevitable and/or de- 
sirable experimental factors ("utility of the theoretical 
framework in practice" . ) 

These issues can be brought to focus in the context 
of the much-discussed problem of recovering the orien- 
tation of cryo-EM snapshots of faint biological objects. 
Direct graph-theoretic attempts to determine the orien- 
tation of snapshots from a synthetic object were aban- 
doned at a signal-to-noise ratio (SNR) of ~ 2 dB, even 
though only additive Gaussian noise was included |18j . 
Noting that graph-theoretic analyses often "fail to solve 
the cryo-EM problem, because the reduced coordinate 
system that each of them obtains does not agree with the 
projection directions" [1^, properties specific to cryo-EM 
images were used to extract information from the snap- 
shots. Graphs were then constructed using this informa- 
tion in order to assign physical meaning to the outcome of 
the analysis. Orienting low-signal cryo-EM snapshots by 
utilizing so-called (straight) common-lines identified pri- 
marily in simulated data with additive Gaussian noise has 
reached remarkably low SNR values [19]. However, such 
assumptions, while justified under some circumstances, 
are not generally valid. Common-lines, for example, are 
present only when elastic single-scattering dominates, are 
straight only when the wavelength of the incident radi- 
ation is so short that the Ewald sphere can be replaced 
by a plane, and are compromised by defocus variations 
essential for reliable structure recovery by cryo-EM I^H]. 

Symmetry-based assignment of physical meaning to 
the outcome of graph-theoretic analysis of scattering data 
and its favorable computational consequences were ad- 
dressed in Paper I. Here, we are concerned with abil- 
ity of this theoretical framework to deal with noise 
and other important factors encountered in experimen- 
tal datasets. This determines the practical utility of an 
approach as much as theoretical elegance and computa- 



tional efficiency. Below, we demonstrate the utility of 
our symmetry-based approach by applying a number of 
manifold-embedding techniques to a variety of simulated 
and experimental datasets (see Table IT]). 



III. CONCEPTUAL SUMMARY OF 
THEORETICAL FRAMEWORK 

A snapshot formed on a 2D detector by scattered ra- 
diation from an object can be represented by a vector, 
with the intensity values recorded at the n detector pixels 
as components (Paper I, Fig. 1). Object motion and/or 
evolution (dynamics) change the pixel intensities, caus- 
ing "the vector tips" representing the ensemble of snap- 
shots to trace out a surface — a manifold — in the n- 
dimcnsional data space. The number of degrees of free- 
dom available to the object determines the dimensional- 
ity of the manifold traced out. Rotations of a rigid object 
in 3D, for example, result in a 3D manifold. 

The data manifold represents the totality of informa- 
tion about the object gathered by the detector in the 
course of an observation. Learning is tantamount to un- 
derstanding the properties of this manifold sufficiently to 
"navigate" on it. Learning the manifold generated by ob- 
ject rotations, for example, is equivalent to constructing 
a 3D model of the object, because, starting from any 2D 
projection (point on the manifold) any other 2D projec- 
tion can be found by navigation, with the shortest route 
corresponding to a geodesic. 

As we are initially concerned with constructing 3D 
models from 2D snapshots, we consider a formulation 
of scattering by a single object in Fourier space so as to 
concentrate on the effect of rotations. This circumvents 
issues such as rigid shifts, which would otherwise have 
to be corrected or incorporated as additional manifold 
dimensions. Rotation operations do not commute. One 
must therefore consider the order in which they are per- 
formed. This leads to a distinction between so-called left 
and right "translations," where a rotation operator T is 
placed to the left or right of another rotation operator R, 
i.e., TR vs. RT. A left translation can be thought of as an 
active rotation in 3D space of the incident beam-detector 
arrangement (frame rotation) after R. Similarly, a right 
translation corresponds to an active rotation of the object 
(object rotation) before R. As TR ^ RT, left and right 
translations must be considered separately. Each forms 
an S0(3) group, and the total set of possible operations 
to be considered corresponds to S0(3) x S0(3). 

A key question is this: Which, if any, of these opera- 
tions leave the distances on the manifold unchanged, i.e., 
which operations are "invisible" to an ant crawling on 
the manifold? These operations would represent sym- 
metries — more precisely isometries — of the manifold. 
For a detector with circular symmetry, the distances on 
the manifold are invariant under beam-detector rotations 
about the beam axis. This is obvious; a frame rotation 
about the beam axis rotates all the snapshots by the same 
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amount about that axis without changing them. This 
leaves the distances on the manifold unchanged. The 
process of image formation on a circularly-symmetric de- 
tector at right angles to the illuminating beam thus has 
S0(2) isometry, i.e., of all possible S0(3) frame oper- 
ations, the SO (2) subset of rotations about the beam 
direction leave the distances on the manifold unchanged. 
This is related to the projection of a 3D object on the 2D 
detector, which is equivalent to a "central slice" through 
the diffraction volume in reciprocal space. 

Consider next the S0(3) set of operations correspond- 
ing to object rotations. It turns out that the metric mea- 
suring distance on the manifold can be decomposed into 
a homogeneous part, which varies uniformly with object 
rotation, plus a residual term, which acts as a fingerprint 
of the object (see Paper I Sec. Ill C). Considering the 
homogeneous part only, the total set of symmetries, is 
then SO (2) x SO (3). The same set of symmetries ap- 
pears in certain models of the universe in general relativ- 
ity [m 1211 , and is associated with well-known eigenfunc- 
tions familiar in the context of spinning tops in classical 
and quantum mechanics |23) . 

The key point here is that the knowledge of the man- 
ifold symmetries, which stem from the nature of opera- 
tions in space, allows one to determine the leading-order 
properties of the manifold under a very general set of 
scattering scenarios, including its natural eigenfunctions. 
Projection of noisy datasets on these eigenfunctions is 
tantamount to noise discrimination. The components of 
a data point representing a snapshot can then be directly 
related to its orientation (see Paper I Sec. Ill D). 



IV. APPLICATIONS 



It has long been known that the use of problem-specific 
constraints can substantially increase computational ef- 
ficiency 124] . By combining wide applicability with class 
specificity, symmetries represent a particularly power- 
ful example of such constraints. In Paper I, we used 
the object-independent symmetries of image formation 
to recover 3D structure from a large ensemble of simu- 
lated, noise-free diffraction snapshots with a computa- 
tional complexity 10* x higher than the state of the art. 
Here we demonstrate the noise robustness stemming from 
exploiting the symmetries of image formation. Exam- 
ples include orientation recovery, 3D reconstruction, and 
movie extraction from ultra-low-signal diffraction or im- 
age snapshots of periodic and non-periodic objects and 
dynamical systems. Each example was selected to high- 
light an important application area. As shown in Table|Tj 
a variety of manifold-embedding techniques can be used. 



A. Structure recovery from simulated diffraction 
snapshots of non-periodic objects at ultra-low signal 

First, we demonstrate 3D structure recovery from a 
collection of two million simulated diffraction snapshots 
of the synthetic protein chignolin (Protein Data Bank 
(PDB) descriptor: lUAO, model 1) at 4 x 10^^ scat- 
tered photons per Shannon pixel at 1.8 A. (A Shannon 
pixel is of the size needed for appropriate sampling of 
the intensity distribution as prescribed by the Shannon- 
Nyquist theorem.) This scattered intensity is expected 
from a 500 kD protein exposed to a single pulse from 
an XFEL [H [3]. At this signal level, Poisson (shot) 
noise dominates. The ability to deal with such levels 
of non-additive noise was previously demonstrated only 
by Bayesian algorithms [31 |H] with extremely unfavor- 
able scaling behavior [T] |31 IH [5] , restricting the size of 
amenable objects to eight times the spatial resolution. 

Here, we use the symmetry-based approach described 
in Paper I after modest denoising. The denoising scheme 
consists of two steps: (1) Convolve the snapshot pixels 
with a 2D Gaussian filter with a width approximately 
equal to that of a Shannon pixel; (2) Replace each snap- 
shot vector by an average over its local neighbors. De- 
pending on the SNR, a number iterations of step (2) may 
be needed, with a stopping criterion based on a least- 
squares residual determined through the first nine Lapla- 
cian eigenfunctions of the dataset (ordered in order of in- 
creasing eigenvalue). These eigenfunctions are employed 
in our scheme to assign an orientation to each snapshot. 
For details see Appendix \X\ and Paper I. 

To estimate the accuracy of orientation recovery, we 
use the following measure for root-mean-square (RMS) 
distance between the deduced and true orientations: 
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where Dij — 2arccos(|ri'r, |) and Dij = 2 arccos(|fi • fj |) 
are the true and estimated internal distances between 
orientations i and j, respectively, and • is the inner prod- 
uct between quaternions. Moreover, to assess the influ- 
ence of local averaging on the eigenfunctions employed 
for orientation recovery, we compute the distance 7 of 
the invariant subspace V spanned by the leading nine 
eigenfunctions of the diffusion matrix in Table [V| from 
the corresponding invariant subspace V of the noise-free 
diffusion matrix [53]. Note that has size s x s, where 
s is the number of snapshots in the data set; i.e., V and 
V are subspaces of . 

Here, we employ a standard distance measure from 
matrix perturbation theory 26 , viz. 
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where U and 77 are orthogonal projectors from W to V 
and V, respectively, and ||-||2 denotes the spectral norm 
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TABLE I. Summary of applications. For DifTusion Map see Refs. [T1[T5], Isomap Ref. [10], GTM Refs. [SUTT]. 



Data type 


Observed system 


Snapshot type 


Reconstruction 


Manifold-embedding technique 


Simulated 


Adenylate kinase molccultpj 
Chignolin molecule 


Diffraction 
Diffraction 


3D structure 
3D structure 


Difl'usion Map 
Diffusion Map 


Experimental 


Superoxide dismutasc-1 crystal 
Chaperonin molecule 
Pirouette 
Pas dc deux 


Diffraction 
Cryo-EM images 
Unsorted image frames 
Unsorted image frames 


Orientation recovery 
3D structure 
Time series 
Time series 


Isomap 
GTM 

Diffusion Map 
Isomap 



^ see Paper I Sec. IV A. 



of matrices. With this definition, 7 Ues in the interval 
[0,1], and may be interpreted as the sine of an angle 
characterizing the deviation of V from V. For our pur- 
poses, Eq. ([2]) is more appropriate than an error measure 
based on the difference between the noisy and noise-free 
diffusion matrices (or their generators), since the latter 
depends on higher eigenfunctions which are not used in 
our scheme. 

Diffraction snapshots were simulated in 2 x 10® differ- 
ent orientations to a spatial resolution of 1.8 A using 1 A 
photons. The orientations were sampled approximately 
uniformly over SO (3), as described in Ref. |27j . Cromer- 
Mann atomic scattering factors [2Sj were used for the 77 
non-hydrogen atoms, and the hydrogen atoms neglected. 
The detector pixel was the appropriate Shannon pixel [3] , 
which oversamples the scattered amplitudes by a factor 
of two, resulting in 40 x 40 = 1600 Shannon detector 
pixels. To model shot noise, diffracted intensities were 
scaled so that the mean photon count (MFC) per Shan- 
non pixel was 0.04 at 1.8 A resolution. The quantized 
photon count at each pixel was obtained from a Poisson 
distribution by the algorithm described in Ref. |29) . 

With no other information, the noisy diffraction pat- 



terns were provided to the algorithm in Table III (width 
of Gaussian filter for image smoothing a = 0.7; num- 
ber of nearest neighbors in the sparse distance matrix 
d = 220; number of nearest neighbors for local averag- 
ing / = 20; number of datapoints for least-squares fitting 
r = 8 X 10"*; number of nearest neighbors for autotun- 
ing n = 30.) As illustrated in Fig. [l] the least-squares 
residual Q* , the subspace distance 7, and the RMS orien- 
tation recovery error e all decrease monotonically for the 
first five iterations of local averaging, but exhibit a mild 
increase at iteration six. At that point the algorithm was 
terminated in accordance with the stopping criterion de- 
scribed above and in Appendix [A] The minimum e value 
attained with this choice of parameters at iteration 5 is 
~ 1.1 Shannon angles. We measured comparable levels 
of orientation-recovery accuracy for various combinations 
of I and n parameters in the range 10-50. In all cases, we 
observed that small values of G* correlate strongly with 
small values of e, indicating that the least-squares resid- 
ual provides an effective guideline for setting the param- 
eters of the algorithm. This is particularly important, 
because G* depends solely on the Laplacian eigenfunc- 
tions (see Table IV), and, unlike e, can be evaluated in 
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Local-iiveraging iteration 

FIG. 1. (a) Least-squares residual Q* , (b) invariant-subspace 
distance 7, (c) RMS internal distance error e, shown as a 
function of the local-averaging iteration count. In Panel (a), 
Q* has been normalized by the number of samples r = 8 x 10* 
used for least-squares fitting. 



an experimental environment where the correct orienta- 
tions are not known. 

The quality of orientation recovery was further tested 
by inverting the reconstructed 3D diffraction volume 
compiled on a uniform Cartesian grid by an interpolation 
scheme consistent with the geometry of diffraction [30) . 
The i?-factor between the gridded scattering amplitudes 
Fi and those obtained from the Fourier transform of the 
recovered electron density from phasing, Fi, was defined 
as 
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FIG. 2. Three-dimensional electron density of the synthetic 
protein chignolin, recovered from 2 x 10^ noisy diffraction 
patterns of unknown orientation at a mean photon count of 
0.04 per pixel at 1.8 A resolution. The ball-and-stick model 
represents the actual structure. 

The 3D electron density obtained by iterative phasing 
with the SUPERFLIP algorithm 31j {R — 0.20) is shown 
in Fig. [2j The close agreement with the known structure 
of chignolin clearly demonstrates sufficient alignment ac- 
curacy for reconstruction to 1.8 A resolution. This is 
on par with the computationally much more expensive 
Bayesian approaches [31 [HI [35] . 

B. Orienting diffraction patterns of crystals 

So far we have shown that our symmetry-based ap- 
proach can be used to orient diffraction patterns from sin- 
gle molecules to high accuracy. We now demonstrate that 
this approach can also orient diffraction snapshots from 
crystals. This is important, because recent XFEL-based 
"diffract-and-destroy" approaches, which use femtosec- 
ond X-ray pulses to "outrun radiation damage" , produce 
diffraction snapshots of nanocrystals of unknown orien- 
tation f33]. As a representative example, we consider a 
biological crystal of the enzyme superoxide dismutase-1 
(SODl, PDB designation: 1AR4) with ~ 3 x 10^ atoms 
per unit cell, and thus highly complex diffraction pat- 
terns. The key issue is whether manifolds produced by 
diffraction snapshots of crystals are sufficiently homo- 
geneous (possess sufficiently homogeneous metrics) for 
snapshot orientations to be recovered in a straightfor- 
ward manner. To demonstrate this point, we intention- 
ally utilized snapshots spanning an orientation range of 
90° so as to produce an open ID manifold, and ana- 
lyzed the dataset with the Isomap manifold-embedding 
method [10]. In contrast to Diffusion Map, whose eigen- 
functions are insensitive at the boundaries, Isomap maps 
a ID open manifold to a straight line segment, and is 




(a) (b) (c) 



FIG. 3. (a) Typical experimental diffraction pattern of su- 
peroxide dismutase-1. (b) The embedding of the geodesic 
distances in the space of the first three eigenvectors, with each 
point representing a diffraction pattern, (c) Relation between 
the correct and the determined orientations. 

sensitive to snapshot orientation over the entire range. 

Experimental diffraction patterns of a single crystal 
of superoxide dismutase-1 with a mosaicity of 0.8° were 
obtained at the Advanced Photon Source (A = 0.98 A). 
The crystal was rotated about an arbitrary axis with a 
step size of 0.5°, and 1800 diffraction patterns recorded 
over a range of 90°. To compensate for spurious beam 
intensity fluctuations, the diffraction pattern intensities 
were normalized. Isomap was used to embed diffracted 
amplitudes (square-roots of intensities) , using two near- 
est neighbors for calculation of geodesic distances (in- 
tegrals of the metric). As shown in Fig. [3] a one- 
dimensional and uniformly populated manifold results, 
with the projection on the first eigenvector linearly pro- 
portional to the snapshot orientation to within 1°, com- 
pared with the crystal mosaicity of 0.8°. The homogene- 
ity of this manifold (metric) establishes that, in principle, 
our symmetry-based approach can be used to treat crys- 
talline objects in the same way as non-periodic single 
particles, provided, of course, object symmetry is appro- 
priately incorporated. 



C. Structure recovery from experimental 
cryo-electron micrographs 

A well-studied application of graph-theoretic tech- 
niques concerns the 3D reconstruction of faint biological 
objects by single-particle cryo-EM without orientational 
information. In cryo-EM, the resolution is strongly de- 
graded by radiation damage. As such, the lowest ac- 
ceptable exposure to electrons and thus SNR must be 
employed. As described in Sec. [ll| this has proved a 
fertile ground for testing new algorithms. By recourse 
to specific properties of cryo-EM images, impressive re- 
sults have been obtained, primarily with simulated data. 
Beyond noise, however, reconstruction by cryo-EM must 
contend with a range of key issues, chief among them the 
loss of information due to zero-crossings in the transfer 
function of the microscope and thus partial loss of infor- 
mation in any single snapshot. The exact position of the 
zero-crossings depends sensitively on microscope defocus. 
This offers a means to recoup some of the lost informa- 
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tion by insuring that the dataset includes micrographs 
obtained over a range of defocus values, each with a dif- 
ferent set of zero-crossings in the transfer function. The 
key point is this: the object structure cannot be recovered 
in full detail from a single defocus, even if the imaging 
parameters were known exactly. Thus, for a reconstruc- 
tion algorithm to be of practical use, it must deal with 
the effect of defocus variations — a test rarely passed by 
new algorithms. Here, we demonstrate structure recov- 
ery from experimental cryo-EM images of the biological 
molecule chaperonin. Specifically, we incorporate the ef- 
fect of defocus, use the symmetry-based homogeneity of 
the manifold metric to deduce orientations, and thence 
recover the 3D object structure. This demonstration is 
mitigated by two factors: (1) in order to expedite the 
calculations, snapshots with only one orientational de- 
gree of freedom were selected from a set presorted by a 
standard orientation algorithm; and (2) to demonstrate 
structure recovery at ultra-low signal — far below what 
is normally used — experimental snapshots were prepro- 
cessed to simulate such low signal levels. 

Randomly oriented single-particle cryo-EM images im- 
ages of the wild-type group II chaperonin in methanococ- 
cus maripaludis (Mm-cpn), obtained with a mean inci- 
dent electron count (MEC) of 20/ A (equivalent to 135 
electrons per 2.6 A-square snapshot pixel) were kindly 
provided by Chiu et al. [Mj. Each snapshot consisted 
of 96 X 96 pixels. A set of 413 side-view snapshots was 
selected from 5000 images, whose orientations had been 
previously determined by the eman program [35) , result- 
ing in a dataset with a single orientational degree of free- 
dom about the object symmetry axis. 

To investigate the performance of our method at lower 
dose, a second data set was produced by applying an ad- 
ditional Poisson process to the raw experimental images. 
The method is based on an approximation valid for low- 
contrast images with Poisson noise and sufficiently large 
MEC. The substitution 1^1' = Pois{I^^^) transforms 
a signal / to a signal with mean MEC' = MEC^/^ 
and variance var(/') = var(J)/4. Simulations verified the 
accurate validity of this approach at MEC = 100, com- 
pared with an MEC per snapshot pixel of 135 for our 
experimental images. Twenty noisy versions of each im- 
age were thus generated to form a data set of 8260 images 
with an effective MEC of 1.7 per A^ . 

Since neither the noise-free signal nor the noise vari- 
ance was known for our experimental cryo-EM images, 
a method developed by Frank [50] was used to estimate 
the SNR directly from the experimental data. This de- 
termines the SNR from the cross-correlation coefficient 
Cij between two images in the same orientational class 
using the definition: 

SNR = lOlogio mean(a,/(l - ^j)), (4) 

where the mean is taken over all classes and all images 
within each class. Provided two images represent differ- 
ent realizations of noise from an identical object in the 



same orientation, the above estimate for SNR agrees with 
the standard definition SNR = 10 log, „ m. 
With the classification obtained from EMAN [3S] , and the 
assumption that class members differ only in noise, we 
estimate a SNR of —6 dB for the raw experimental snap- 
shots (MEC: 20/A^) and a SNR of -16 dB for the pre- 
processed experimental snapshots (MEC: 1.7/A^). 

As described above, the defocus value and hence 
transfer function of the electron microscope vary from 
snapshot to snapshot. To analyze such cryo-EM data, 
we implemented a modified version of the manifold- 
embedding algorithm Generative Topographic Mapping 
(GTM) [T71 [37] to explicitly incorporate the effect of the 
microscope transfer function. GTM defines a manifold in 
data space by partitioning the noisy dataset into a num- 
ber of Gaussians each centered around a point (node) on 
the manifold. The partitioning is based on a nonlinear 
mapping of a latent space, in this case the space of ro- 
tations. GTM is thus, in essence, a manifold-embedding 
technique, with the symmetries of scattering manifested 
in the homogeneity of the data manifold, as described 
in Paper I. However, the generative capability of GTM 
allows one to construct an image (in essence a model 
snapshot) at each node on the data manifold. In our 
approach, this model image extracted from the data cor- 
responds to the aberration-free projected potential of the 
object. In order to assign an experimental snapshot to 
a model image, its distance from the model is calculated 
after convolving the model with the transfer function of 
the microscope at the defocus corresponding to that of 
the experimental snapshot. This convolution proceeds 
efficiently as multiplication in Fourier space, and is not 
computationally expensive. A similar approach based 
on more efficient manifold-embedding techniques will be 
published elsewhere. 

The GTM-based approached was first validated with 
simulated cryo-EM images of chaperonin over a typi- 
cal experimental defocus range of 10,000 A to 30,000 A 
( under f o cus ) . The orientations were successfully recov- 
ered to within 1°. To reconstruct 3D density maps from 
experimental snapshots, model aberration-free projected 
potentials were generated (lifted) at 16 equally-spaced 
nodes of the data manifold produced by experimental 
images replicated according to the 8-fold object symme- 
try. 3D density maps were then reconstructed tomo- 
graphically using the back-projection algorithm BG CG 
of the SPIDER software package [3S]. For comparison, 
a simulated density map was obtained from 2D snap- 
shots using the known chaperonin atomic coordinates 
(PDB identifier: 3L0S) under the following imaging 
conditions: spherical aberration Cg ~ 4.1 mm; defo- 
cus Af = 24,000 A (underfocus); electron energy E = 
300 keV; damping envelope parameter _B = 50 A ; im- 
ages phase-flipped. The resulting 3D density map was 
passed through a 5 A Gaussian filter. 

Fig. pfa) shows a typical experimental snapshot. 
Fig. W[h) the average of the micrographs assigned to an 
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orientation class by the cryo-EM reconstruction software 
package eman [35], and Fig.|4]jc) the snapshots oriented 
by manifold embedding and reconstructed (lifted) from 
the manifold. (For a movie of the reconstructed tilt series 
see EPAPS Movie 1). Note that the manifold is able to 
generate missing images by interpolation. The improved 
quality of the manifold-generated snapshots compared to 
the class averages offers the possibility to reconstruct at 
significantly reduced dose. Fig. Qd) is an experimen- 
tal snapshot preprocessed to approximate snapshots ex- 
pected from a single chaperonin molecule at a dose 12 x 
lower than commonly used j34l(SNR ~ —16 dB, i.e., 10 
dB below a typical dose). Fig. Qe) is the snapshot lifted 
from the manifold after orienting an ensemble of 8000 dif- 
ferent raw snapshots by manifold embedding. It is clear 
from this image, the corresponding tilt series (EPAPS 
Movie 2), and the 3D reconstructions of Fig.|4]jf-h) that 
snapshots can be successfully oriented by manifold em- 
bedding to produce 3D models, even at 12x lower signal 
than in use today. Note that images at this dose could 
not be oriented by standard cryo-EM approaches [38] . 
even when accurately centered prior to analysis, as was 
performed here. In contrast, our orientation recovery re- 
sults were similar to those obtained at an MEC of 20/ A , 
indicating that the effect of lower signal levels can be 
compensated by increasing the number of snapshots. 

Our results thus offer the tantalizing possibility of re- 
ducing the snapshot dose in 3D reconstruction techniques 
using ionizing radiation, in some cases by at least an or- 
der of magnitude. This would significantly mitigate the 
limits set by radiation damage. As a benchmark, the 
essentially unfulfilled promise of the costly transition of 
cryo-EM to liquid He temperatures was aimed at improv- 
ing dose tolerance by a factor of two. The superior signal 
extraction capability offered by manifold mapping could 
also be used to obtain images at smaller defocus values 
in order to reconstruct the object to higher resolution. 



D. Time-series (movies) from ultra- low-signal 
random-sequence snapshots 

Our knowledge of the precise time at which an ex- 
perimental snapshot of a dynamic system was obtained 
is corrupted by inevitable uncertainties, which can sub- 
stantially exceed the intrinsic time resolution of the ob- 
servation technique. Modern pump-probe experiments, 
for example, can now be performed with pulses as short 
as a few femtoseconds, but their time-resolution is often 
determined by timing jitter, which can be up to two or- 
ders of magnitude larger [33 HO] . When the state of a 
system under observation is not synchronized with the 
observation windows, a sequence of snapshots can repre- 
sent random sightings of the system during its evolution. 
It is thus important to develop means for deducing "time 
stamps" directly from the data, either to reduce jitter- 
induced uncertainty, or to order a sequence of snapshots 
according to the intrinsic evolution of the system un- 




FIG. 4. (a) Experimental cryo-electron micrograph of a chap- 
eronin molecule at a mean electron count of 20/A (SNR = 
—6 dB). (b) Image obtained by averaging the members of 
an orientation class, (c) Image generated (lifted) from the 
data manifold, (d) Experimental micrograph of chaperonin 
molecule processed to reflect a mean electron count of 1.7/ 
(SNR = — 16 dB). (e) Image lifted from the data manifold, 
(f) 3D reconstruction with simulated images, (g) 3D recon- 
struction with lifted images at a snapshot dose of 20/ A^. (h) 
3D reconstruction with lifted images at a snapshot dose of 
1.7/r. 



der observation. Here, we demonstrate this capability 
at SNRs as low as —21 dB. Specifically, we show that: 
(1) Randomized movie sequences can be time-ordered, 
even when the signal is extremely low; and (2) Frames 
generated (lifted) from the manifold produce movies of 
superior quality. 

Movies of a pirouette and a pas de deux were down- 
loaded from the web. These represent optical snapshots 
of a conformationally rigid body in rotations, and the 
evolution of two flexible bodies in interaction, respec- 
tively. In order to reduce the SNR, a constant back- 
ground was added, and shot noise incorporated at each 
pixel depending on its intensity value, as described in 
Ref. [33]. For the pirouette, a sequence of 16 turns con- 
sisting of 268 frames (210 x 160 pixels each) was replicated 
132 times, a background 5x the mean intensity added, 
and shot noise incorporated to produce an effective mean 
photon count per pixel of 0.08 and a SNR of —21 dB (see 
Eq. |4| . For the pas de deux, a sequence of 870 frames 
(265 X 305 pixels each) was replicated 12 times with an 
added background of twice the mean intensity, and shot 
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4: 
5: 
6: 

7: 
8: 
9: 

10 
11 
12: 
13 
14 
15 



TABLE II. Manifold-lifting algorithm based on GTM 

Inputs: 

Noisy snapshots J\4i — {LiJ ■ ■ ■ ^ Lg} 

Estimated quaternions 1~ — {ti , . . . . Tg} 

number of nodes K 

number of basis functions AI 

basis function width cr 

Outputs: 

Manifold-lifted images A4 — {ai- ■ ■ ■ ,^^} 

Generate the grid of latent points {xi, . . . , xk}- 

Generate the grid of basis function centers . . . 

Compute the matrix of basis function activations such that 

^,^{x) ^ CKp{-{x - flrr^f /2a'^). 

Initialize a set of weights W using principal component analysis. 
Initialize inverse Gaussian noise variances a and /3. 
Compute a set of responsibilities R that assigns each snapshot to a 
node from the results of manifold embedding. 

Compute the diagonal matrix G using R, where G^k — X^i=i ^ki- 
repeat 

W -f- {^^G4>-}-XI)~^^^ RA4i, where the rcgularization param- 
eter A may be zero. 

Compute A, where Akn — \\Ln ~ ^kW\\^. 

Compute 7 from A and a. 

Update a and /3 using 7, R and A. 
until convergence of W 
M ^ 
return A4 
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FIG. 5. Top row: First five frames of 35,000 randomly- 
sequenced snapshots of a pirouette preprocessed to reflect a 
mean photon count of 0.08 per pixel with added background 
and shot noise, corresponding to a signal-to-noise ratio of —21 
dB. Bottom row: Five evenly-spaced images extracted from 
the Diffusion Map manifold. (See also EPAPS Movie 3.) 




noise incorporated to produce a mean photon count of 0.8 
and a SNR of —11 dB. For both movies, camera motion 
was corrected by reference to a stationary marker. 

Each random sequence was ordered by a suit- 
able manifold-embedding technique (Diffusion Map or 
Isomap). Using the generative property of GTM, im- 
ages were then lifted from the manifold. As described 
in Ref. Hj and demonstrated in Fig. |4]^c,e), this proce- 
dure uses the information content of the entire dataset to 
generate each snapshot, producing images of significantly 
higher quality than possible by traditional classifying and 
averaging techniques. It is also more robust against non- 
uniform sampling and jitter. Table [TT] summarizes the 
lifting procedure. 

For the pirouette. Diffusion Map was used to recover 
the object orientation in each frame during the dance, 
and hence the sequence order (number of nearest neigh- 
bors in the sparse distance matrix d — 5896; Gaussian 
kernel bandwidth e ~ 200.) Snapshots were then lifted 
from the manifold (number of nodes K = 28; number 
of basis functions M = 14, basis function width a = 2.) 
Reconstructed images are shown in Fig.[5]together with a 
sequence of unsorted, unprocessed snapshots. The movie 
is available as EPAPS Movie 3. The randomized pas de 
deux sequence was ordered with Isomap (number of near- 
est neighbors d — 33.) To compile the movie, images were 
lifted from an ordered sequence of 870 points (nodes), 
corresponding to uniform sampling on the Isomap mani- 
fold. Reconstructed images are shown in Fig. [6] together 
with a sequence of unsorted snapshots. The movie is 
available as EPAPS Movie 4. 

Figs. [5] and |6j and the associated movies clearly show 
that our approach is able to determine the correct frame 
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FIG. 6. Top row: First four frames of 10,000 randomly- 
sequenced snapshots of a pas de deux preprocessed to reflect 
a mean photon count of 0.8 per pixel with added background 
and shot noise, corresponding to a signal-to-noise ratio of —11 
dB. Bottom row: Four evenly-spaced images extracted from 
the Isomap manifold. (See also EPAPS Movie 4.) 



sequence and generate high quality snapshots at signal 
levels as low as 0.08 photon/pixel for the pirouette (mod- 
ulo one revolution), and 0.8 photon/pixel for the pas 
de deux, both with added background and non-additive 
noise corresponding to signal-to-noise ratios in the range 
— 11 to —21 dB. These examples demonstrate the ca- 
pability to determine the time evolution of systems from 
unsorted random sightings at extremely low signal. They 
also highlight the potential to correct timing jitter in 
pump-probe experiments, and reconstruct the evolution 
of dynamic systems from random sightings of members 
of a heterogeneous set, each at a different stage of its 
evolution. These possibilities will be described in detail 
elsewhere. The general implications for signal extraction 
and image processing are clear. 
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V. CONCLUSIONS 

We have shown that manifold mapping, as described 
in Paper I, augmented with modest noise reduction mea- 
sures, is able to extract structural and timing informa- 
tion from simulated and experimental snapshots at ex- 
tremely low signal. The ability to orient simulated and 
experimental diffraction and image snapshots confirms 
the accessibility of the homogeneous manifold expected 
from our theoretical framework for a wide range of ob- 
jects and imaging scenarios, including crystalline sam- 
ples. The capability to recover 3D structure at extremely 
low signal is on par with the more expensive Bayesian ap- 
proaches, but offers greater reach in terms of sample size 
and resolution, as demonstrated in Paper I. The noise- 
robustness of our approach substantially exceeds what 
has been demonstrated with comparable graph-theoretic 
approaches without restrictive, application-specific as- 
sumptions. The manifold itself offers a powerful route 
to image reconstruction at low signal, because snapshots 
reconstructed from the manifold achieve higher signal-to- 
noise ratios than possible by traditional approaches based 
on classification and averaging. Taken together, these 
offer a physically-based, computationally efficient, noise- 
robust route to analyzing the large and varied datasets 
generated by existing and emerging structure recovery 
methods. In the longer term, it should be possible to 
use these approaches to recover or improve timing in- 
formation in pump-probe experiments, and construct 3D 
movies (4D maps) from random sightings of members of 
structurally heterogeneous and dynamically evolving en- 
sembles. 



sured pixel amplitudes, viz. 

Oj i-> a,j = + (5a j, (Ala) 

Sa^ = {Saii,Sai2, . . . , Sa^n)'^ , Suij = llj"^ - Ka^j . 

(Alb) 

This causes the observed, noisy, dataset 

■M = {ai,a2,...,aj (A2) 

not to lie exactly on the manifold A4 (up to a global 
scaling by k). One would expect that if the perturba- 
tion norm \\Sa^\\ becomes comparable to the kernel band- 
width e^/^, the computed eigenfunctions tp^ are distorted 
to the point that the embedded manifold no longer has 
the topology of SO (3) [151 SI]- Indeed, as illustrated 
in Fig. [T] direct application of the algorithm for noise- 
free data (see Paper I Table III) to a noisy dataset Ai 
at an MPC = O(10~^) results in poor accuracy. In or- 
der to be practically useful, the noise-free orientation- 
recovery scheme must be augmented by a suitable de- 
noising method. 

Conceptually, we denoise the data in three steps: 
(1) Low-pass filtering of each snapshot by convolution 
of pixel intensities with a 2D Gaussian; (2) Variance- 
stabilizing transformation (VST); and (3) Local averag- 
ing over nearest neighbors in data space prior to embed- 
ding. In practice, we combine (1) and (2), known to be 
effective for shot noise [i^l - Hij . into a single step, and 
follow the iterative procedure described below. 

1. Low-pass Gaussian filtering and 
variance-stabilizing transformation (VST) 
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Appendix A: Treatment of noise 

In the manifold picture, noise can be described 
as a perturbation of the noise-free manifold Ai = 
{ai,a2T ■ ■ y^s}i where Oj is a snapshot vector of mea- 



Convolution with a low-pass Gaussian filter of band- 
width a is represented by: 

I^{I*H„){r) = [ df I{f')H„{f-r'), (A3a) 



with 



H^if) = exp(-|lrll V2a2)/(2^a2)i/2. (A3b) 



VST, proposed by Guan [44 for low- intensity data, can 
be written as: 



(A4) 



I{r) denotes the (discretely sampled) intensity pattern on 
the detector plane obtained by "unpacking" the column 
vector of intensities /. 



Eqs. (A3) and (A4| are combined into a single opera- 
tion: 



VST(/; a) = (/ * H^y/^ + [(/ * H,) + l]i 



/2 



(A5) 



Given an intensity-pattern dataset Mj consisting of s 
samples and an index matrix of nearest neighbors N of 
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dimensions s x /, we introduce a combined VST and 
aggregation operation taking A^/ to a dataset Ai = 
VSTL{Mi;a,N) such that 



= VST V/„ -a . (A6) 



Noise robustness can be further enhanced by a so-called 
self-tuning Gaussian kernel introduced by Zelnik-Manor 
and Perona [45]. Here, instead of the isotropic Gaussian 
kernel JC{a^,aj) = exp(— ||aj — a^W^/e) of Eq. (B3) in 
Paper I, one uses an anisotropic Gaussian kernel with 
local scaling parameters, e^, given by 



K^{a„aj) = exp(-||a, - aj-||V(e,ejy/^) 



(A7) 



A canonical choice for the scaling parameters, which we 
adopt throughout, is et = ||aj — a^v., IP: where, as usual. 
Nil denotes the index of the l-th nearest neighbor of dat- 
apoint i. 



2. Iterative local averaging 

If the true nearest neighbors of a point in data space 
are known, and noise produces no systematic bias, lo- 
cal averaging approaches the true manifold. To see this, 
let s be an error tolerance in data space, and consider 
a reference orientation R with corresponding noise-free 
snapshot a. For any e > it is possible to find a ball 
in data space centered at a, such that for any countable 



set {c 



,ai} of noise- free snapshots lying in i?^ the 



mean, a = J2i=i^i/^' error ||a — a|| < e. In the pres- 
ence of noise, the are replaced by the random variables 
in Eq. ( AlaP ; i.e., i— a^, where are statistically in- 



dependent, have expectation value k ' proportional to 
flj, and finite variance Aaf. Moreover, the sample mean 
within Bf, becomes a random variable a = ^i/^ with 

expectation value K^/^a and variance Aa^ — X]i=i ^Qii/l- 
By the law of large numbers, in the limit of an infinite 
data set with infinite snapshots in (i.e., / oo), is 
equal to (up to an unimportant proportionality con- 
stant) with probability one. Thus, recovery of the data 
manifold with error e is possible almost surely. 

In practice, noise corrupts the local neighborhood rela- 
tions, and without a priori information, it is not possible 
to identify which of the snapshots in a noisy data set are 
associated with the ball B^ of the underlying noise-free 
system. Therefore, it cannot be guaranteed that local 
averaging leads to the correct manifold. We therefore 
exploit our knowledge of the natural eigenfunctions of 
scattering manifolds to monitor the effect of local av- 
eraging, and terminate the procedure before substantial 
deviations have occurred. 

Specifically, we follow the algorithm described in Ta- 
ble III First, we apply the VST operation (A5) to the 



intensity data {/j} = Mi, setting the filter width a to 
a relatively small value (e.g., in Sec. IV A cr is set to 



6 
7 
8 
9 

10: 

11: 

12: 
13: 
14: 
15: 
16 

17: 
18: 
19: 



TABLE III. Orientation-recovery for noisy snapshots 

Inputs: 

Noisy snapshots A4j — {/^ , . . . , /^} 
Number of retained nearest neighbors d 
Number of nearest neighbors for local averaging 
Number of datapoints in the least-squares fit, r 
Number of nearest neighbors for autotuning, n 
Gaussian filter bandwidth a 

Outputs: 

Estimated quaternions 7~ — {ri , . . . , } 
Estimated nearest-neighbor index matrix N 
Least-squares residual Q* 

for i — 1, . . . , s do 

a, ^ vsT(/,;(j) 

end for 

Aio ^ {a^} > initial iterate for Diffusion Map input data 

Execute the algorithm in Table [iv| with input data Alo; store the 
returned nearest-neighbor index matrix as No and the least squares 
residual as C/,* . 

i 1 initialize iteration counter. 

terminate false > initialize termination flag, 

while terminate = false do 

Mi -i— VSTL(A^/; cr, Ni_i) > current iterate for Diffusion Map 

input-data 

Execute the algorithm in Table llVJ with input data store 
the outputs as 7i, Ni, and Q* . 

terminate Q* > > terminate if the residual has 

increased. 

if terminate -. 

i <- i + 1 
end if 
end while 

T -i— Ti-i > set outputs to the values corresponding to minimum 
residual. 



false then 



[> increment iteration counter. 



N 



return 7~, G* , N. 



7/10 of a pixel width). The autotuning version of the 
orientation-recovery method (Table [lV| is executed using 
the VST-filtered intensities as input data. The nearest- 
neighbor indices Nq obtained in the course of the calcula- 
tion of the sparse distance matrix then become our initial 
estimate for the true nearest-neighbor indices. We also 
record the residual of the nonlinear least-squares output, 
Gq, and choose a value I < d for the number of nearest 
neighbors for local averaging. 

Next, we enter an iteration loop, where in the i-th step 
the dataset 



= VSTL(X/;cr,N,_i) 



(A8) 



is computed, and the algorithm in Table |IV| executed 
using Mi as input data. The resulting quaternion esti- 
mates, nearest-neighbor indices, and least-squares resid- 
ual are respectively designated 7i, N^, and G* ■ Note that 
the residual Gi is a measure of the difference between 
the eigenfunctions obtained by embedding and the natu- 
ral eigenfunctions (Wigner /^-functions) expected on the 
basis of symmetry (see Paper I). 

If, after an iteration i, Gi is larger than the resid- 
ual Gi-i encountered in the previous step, the loop is 
terminated. Otherwise, the iteration is repeated using 
Ni as an updated estimate of the true nearest-neighbor 
indices. Our final orientation (quaternion) assignment 
is the one corresponding to the minimum least-squares 
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TABLE IV. Orientation-recovery using a self-tuning kernel 

Inputs: 

Snapshots Ai — {o^ , ■ ■ ■ , Oig} 
Number of retained nearest neighbors d 
Number of datapoints in the least-squares fit, r 
Number of nearest neighbors for autotuning, n 

Outputs: 

Estimated quaternions 7~ — {ri , . . . , Tg}, 
Nearest-neighbor index matrix N 
Least-squares residual Q* 

1: Compute the s X d matrices N and S such that 

Nij — index of j-th nearest neighbor to snapshot a^, 
Sii = Ho, - a... , 11. 



2: return N 

3: Rescalc the distance data by the n-th nearest neighbors: 



The empirical evidence in Sec. IV A clearly shows that, 



Sij 4— ^5i,iN 



,1/2 



4: Compute the sparse transition probability matrix P using the algo- 
rithm in Table [V| with inputs S, N, e, and a — 1. 

5: Solve the sparse eigenvalue problem PV'^, — '^fc^^ ^'^^ < < 9 
and 1 — Ao<Ai<---<Ag. 

6: Solve the nonlinear least-squares problem 



given a sufficiently large number of sample points, the 
scheme, applied only a handful of times and terminated 
using the value of G* as a criterion, provides noise re- 
duction sufficient for accurate orientation recovery at 
MPC = 0(10-2). 

A potentially fruitful way of interpreting mathemati- 
cally the success of the process (which lies outside the 
scope of the present paper) would be to explore its con- 
nections with mutually reinforcing models (MRMs) for 
graph filtering [46) . This type of model involves iter- 
atively replacing vertices of graphs with weighted aver- 
ages, whereby the vertex itself and its local neighborhood 
exert an influence on the vertex in the course of iterative 
updates. In certain applications, the iterative process in 
an MRM is terminated after only a small number of it- 
erations. Both of these two features are present in the 
scheme presented here. 



Appendix B: Computational resources 



G{{cijk}) = ^IIRi'Ri - if + I dot(Ri) - 1|" with [R,]ij = dj^iPii 



return G* 

for i — 1, . . . , s do 

Compute an approximate SO (3) matrix for snapshot 
Project Ri to an orthogonal matrix 
Convert R; to a unit quaternion 
return 
end for 



TABLE V. Calculation of the sparse transition probability 
matrix in Diffusion Map, reproduced from Paper I for con- 
venience. 

Inputs: 

s X d distance matrix S 

s X d nearest-neighbor index matrix N 

Gaussian width e 

Normalization parameter a 

Outputs: 

s X s sparse transition probability matrix P 
1: Construct an s x s sparse symmetric weight matrix W, such that 



1, ifi^J, 

Wji, if Wij / 0, 

0, otherwise. 



2: Evaluate the s x s diagonal matrix Q with nonzero elements Qa — 

3: Form the anisotropic kernel matrix K — Q "WQ 

4: Evaluate the s x s diagonal matrix D with nonzero elements Da — 

5: return = D^^K 

residual, reached in the iteration prior to the termina- 
tion step. 



The calculations reported in this work were primarily 
performed on a Rocks cluster with 30 nodes, each con- 
sisting of two 2.5 GHz Quad-Core Intel Xeon CPUs with 
16 GB RAM. Algorithms were usually implemented in 
MATLAB r2009b with the Parallel Computing Toolbox 
together with the MATLAB Distributed Server using up to 
120 workers (parallel processes). For less intensive calcu- 
lations, a Linux workstation with a 2.66 GHz Quad-Core 
Intel Xeon CPU, 32 GB RAM and/or a Mac Pro 2 x 2.8 
GHz Quad-Core Intel Xeon CPUs, 10 GB RAM were 
used. In Diffusion Map, by far the most CPU-intensive 
calculations are: (1) the determination of the Euclidean 
distances of snapshots; and (2) setting up of the sparse 
distance matrix of nearest neighbors. These calculations 
were performed in parallel using 100 workers. Such a 
distance calculation involving 2 x 10^ snapshots typically 
takes 7 hours for chignolin and 48 hours for adenylate 
kinase (ADK) in Paper I. Other calculations, including 
the eigenvector determination and the estimation of the 
orientation matrices were performed on the Linux work- 
station in about 8 hours altogether. In total, the orienta- 
tion determination for 2 x 10^ snapshots requires 56 hours 
for noise-free ADK and 33 hours for noisy chignolin with 
5 local-averaging iterations. Compiling a 3D diffraction 
volume consisting of a uniform Cartesian grid was imple- 
mented in parallel code, with an execution time of less 
than three hours for two million ADK diffraction snap- 
shots using 80 workers. Diffraction patterns and cryo-EM 
images were simulated on the cluster and on the Mac Pro. 
The GTM and phasing algorithms were performed on the 
Linux workstation and/or the Mac Pro. The chimera 
package [47j was used to visualize electron density maps. 
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